distributional semantics
Self-Attention as Distributional Projection: A Unified Interpretation of Transformer Architecture
This paper presents a mathematical interpretation of self-attention by connecting it to distributional semantics principles. We show that self-attention emerges from projecting corpus-level co-occurrence statistics into sequence context. Starting from the co-occurrence matrix underlying GloVe embeddings, we demonstrate how the projection naturally captures contextual influence, with the query-key-value mechanism arising as the natural asymmetric extension for modeling directional relationships. Positional encodings and multi-head attention then follow as structured refinements of this same projection principle. Our analysis demonstrates that the Transformer architecture's particular algebraic form follows from these projection principles rather than being an arbitrary design choice.
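To make the claimed connection concrete, here is a toy numerical sketch (our own construction, not code from the paper): a symmetric co-occurrence-style affinity between sequence embeddings is row-normalized with softmax, and learned query/key/value maps then break the symmetry to model directional influence. All names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq_len, vocab = 16, 6, 50

E = rng.normal(size=(vocab, d))            # GloVe-style word embeddings
ids = rng.integers(0, vocab, size=seq_len)
X = E[ids]                                  # embeddings of one token sequence

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Symmetric projection: attention weights straight from embedding affinities,
# i.e., an in-context slice of a corpus-level co-occurrence statistic.
A_sym = softmax(X @ X.T / np.sqrt(d))

# Asymmetric extension: learned query/key maps let "i attends to j" differ
# from "j attends to i", recovering the standard QKV form.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
A_qkv = softmax((X @ W_q) @ (X @ W_k).T / np.sqrt(d))
contextualized = A_qkv @ (X @ W_v)          # (seq_len, d) output representations
```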
Do Large Language Models Defend Inferentialist Semantics?: On the Logical Expressivism and Anti-Representationalism of LLMs
The philosophy of language, which has historically been developed through an anthropocentric lens, is now being forced to move towards post-anthropocentrism due to the advent of large language models (LLMs) such as ChatGPT (OpenAI) and Claude (Anthropic), which are considered to possess linguistic abilities comparable to those of humans. Traditionally, LLMs have been explained through distributional semantics as their foundational semantics. However, recent research is exploring alternative foundational semantics beyond distributional semantics. This paper proposes Robert Brandom's inferentialist semantics as a suitable foundational semantics for LLMs, specifically focusing on the issue of linguistic representationalism within this post-anthropocentric trend. Here, we show that the anti-representationalism and logical expressivism of inferentialist semantics, as well as quasi-compositionality, are useful in interpreting the characteristics and behaviors of LLMs. Further, we propose a consensus theory of truths for LLMs. This paper argues that the characteristics of LLMs challenge mainstream assumptions in philosophy of language, such as semantic externalism and compositionality. We believe the argument in this paper leads to a re-evaluation of anti-representationalist views of language, potentially leading to new developments in the philosophy of language.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)
- (14 more...)
- Research Report (1.00)
- Overview (0.93)
Are LLMs Models of Distributional Semantics? A Case Study on Quantifiers
Zhang, Enyan, Wang, Zewei, Lepori, Michael A., Pavlick, Ellie, Aparicio, Helena
Distributional semantics is the linguistic theory that a word's meaning can be derived from its distribution in natural language (i.e., its use). Language models are commonly viewed as an implementation of distributional semantics, as they are optimized to capture the statistical features of natural language. It is often argued that distributional semantics models should excel at capturing graded/vague meaning based on linguistic conventions, but struggle with truth-conditional reasoning and symbolic processing. We evaluate this claim with a case study on vague (e.g., "many") and exact (e.g., "more than half") quantifiers. Contrary to expectations, we find that, across a broad range of models of various types, LLMs align more closely with human judgements on exact quantifiers than on vague ones. These findings call for a re-evaluation of the assumptions underpinning what distributional semantics models are, as well as what they can capture.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (2 more...)
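A minimal sketch of the vague-versus-exact comparison described in the entry above, under our own assumptions: an exact quantifier ("more than half") gets crisp truth conditions over counts, while a vague one ("many") gets a graded acceptability curve; a model's judgments can then be scored against each reference. The logistic form, threshold, and steepness are invented placeholders, not the authors' design.

```python
import math
from dataclasses import dataclass

@dataclass
class Scene:
    total: int      # objects in the scene
    matching: int   # objects satisfying the predicate (e.g., "red")

def exact_truth(scene: Scene) -> bool:
    # "More than half of the objects are red": crisp, truth-conditional.
    return scene.matching > scene.total / 2

def vague_acceptability(scene: Scene, theta: float = 0.4, k: float = 12.0) -> float:
    # "Many of the objects are red": graded; here a logistic curve over the
    # proportion, with assumed threshold theta and steepness k.
    p = scene.matching / scene.total
    return 1.0 / (1.0 + math.exp(-k * (p - theta)))

scene = Scene(total=10, matching=6)
print(exact_truth(scene))                      # True (6 > 5)
print(round(vague_acceptability(scene), 2))    # ~0.92 under these assumed parameters
```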
Constructive Approach to Bidirectional Causation between Qualia Structure and Language Emergence
Taniguchi, Tadahiro, Oizumi, Masafumi, Saji, Noburo, Horii, Takato, Tsuchiya, Naotsugu
This paper presents a novel perspective on the bidirectional causation between language emergence and the relational structure of subjective experiences, termed qualia structure, and lays out a constructive approach to the intricate dependency between the two. We hypothesize that languages with distributional semantics, e.g., syntactic-semantic structures, may have emerged through the process of aligning internal representations among individuals, and that such alignment of internal representations in turn facilitates more structured language. This mutual dependency is suggested by recent advances in AI and symbol emergence robotics, and by the collective predictive coding (CPC) hypothesis in particular. Computational studies show that neural network-based language models form systematically structured internal representations, and that multimodal language models can share representations between language and perceptual information. This perspective suggests that language emergence serves not only as a mechanism for creating a communication tool but also as a mechanism for allowing people to achieve a shared understanding of qualitative experiences. The paper discusses the implications of this bidirectional causation in the context of consciousness studies, linguistics, and cognitive science, and outlines future constructive research directions for further exploring this dynamic relationship between language emergence and qualia structure.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- (7 more...)
An Essay concerning machine understanding
Roitblat, Herbert L.
Artificial intelligence systems exhibit many useful capabilities, but they appear to lack understanding. This essay describes how we could go about constructing a machine capable of understanding. As John Locke (1689) pointed out, words are signs for ideas, which we can paraphrase as thoughts and concepts. To understand a word is to know and be able to work with the underlying concepts for which it is an indicator. Understanding between a speaker and a listener occurs when the speaker casts his or her concepts into words and the listener recovers approximately those same concepts. Current models rely on the listener to construct any potential meaning. The diminution of behaviorism as a psychological paradigm and the rise of cognitivism provide examples of many experimental methods that can be used to determine whether and to what extent a machine might understand, and to make suggestions about how that understanding might be instantiated.
- Europe > Austria > Vienna (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Reflectance Estimation for Proximity Sensing by Vision-Language Models: Utilizing Distributional Semantics for Low-Level Cognition in Robotics
Osada, Masashi, Ricardez, Gustavo A. Garcia, Suzuki, Yosuke, Taniguchi, Tadahiro
Large language models (LLMs) and vision-language models (VLMs) have been increasingly used in robotics for high-level cognition, but their use for low-level cognition, such as interpreting sensor information, remains underexplored. In robotic grasping, estimating the reflectance of objects is crucial for successful grasping, as it significantly impacts the distance measured by proximity sensors. We investigate whether LLMs can estimate reflectance from object names alone, leveraging the embedded human knowledge in distributional semantics, and if the latent structure of language in VLMs positively affects image-based reflectance estimation. In this paper, we verify that 1) LLMs such as GPT-3.5 and GPT-4 can estimate an object's reflectance using only text as input; and 2) VLMs such as CLIP can increase their generalization capabilities in reflectance estimation from images. Our experiments show that GPT-4 can estimate an object's reflectance using only text input with a mean error of 14.7%, lower than the image-only ResNet. Moreover, CLIP achieved the lowest mean error of 11.8%, while GPT-3.5 obtained a competitive 19.9% compared to ResNet's 17.8%. These results suggest that the distributional semantics in LLMs and VLMs increases their generalization capabilities, and the knowledge acquired by VLMs benefits from the latent structure of language.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (3 more...)
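A minimal sketch of the text-only setup described in the entry above, under our own assumptions: build a prompt from an object name, send it to a chat LLM, and parse a single numeric reflectance from the reply. The prompt wording, the abstract `query_llm` wrapper, and the error metric are illustrative placeholders, not the authors' code.

```python
import re

def build_prompt(object_name: str) -> str:
    # Assumed prompt wording; the paper's actual prompt may differ.
    return (f"Estimate the typical reflectance of a {object_name} for an "
            "infrared proximity sensor, as a percentage from 0 to 100. "
            "Answer with a single number.")

def parse_reflectance(reply: str) -> float:
    # Pull the first number out of a free-text model reply.
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no number found in reply: {reply!r}")
    return float(match.group())

def mean_absolute_error(preds, targets):
    # The abstract reports mean errors (e.g., 14.7% for GPT-4 text-only).
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

# query_llm(prompt) would wrap an actual chat-completion call (e.g., GPT-4);
# it is deliberately left abstract here.
reply = "Around 35"                 # stand-in for a model reply
print(parse_reflectance(reply))     # 35.0
```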
Domain Embeddings for Generating Complex Descriptions of Concepts in Italian Language
In this work, we propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries, designed to address the challenge of bridging the gap between the continuous semantic values represented by distributional vectors and the discrete descriptions offered by general semantics theory. Recently, many researchers have concentrated on the nexus between embeddings and a comprehensive theory of semantics and meaning. This often involves decoding the representation of word meanings in Distributional Models into a set of discrete, manually constructed properties such as semantic primitives or features, using neural decoding techniques. Our approach introduces an alternative strategy grounded in linguistic data. We have developed a collection of domain-specific co-occurrence matrices, derived from two sources: a classification of Italian nouns categorized into 4 semantic traits and 20 concrete noun sub-categories, and a list of Italian verbs classified according to their semantic classes. In these matrices, the co-occurrence values for each word are calculated exclusively with a defined set of words pertinent to a particular lexical domain. The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface. Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge, such as a matrix based on location nouns and the concept of animal habitats. We assessed the utility of the resource through two experiments, achieving promising outcomes in both: the automatic classification of animal nouns and the extraction of animal features.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Africa (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (7 more...)
- Health & Medicine (1.00)
- Transportation (0.93)
- Materials > Chemicals (0.45)
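As a concrete illustration of the resource's core data structure, here is a toy sketch (our own construction; the word lists are invented stand-ins, not entries from the Italian resource): a co-occurrence matrix whose columns are restricted to a fixed set of domain words, so that each target word is described only by its domain-relevant contexts.

```python
from collections import Counter

# Invented stand-ins for one lexical domain (e.g., location nouns).
domain_words = ["forest", "river", "desert", "nest"]

def domain_cooccurrence(sentences, targets, window=5):
    """Count co-occurrences of each target word with domain words only."""
    counts = {t: Counter() for t in targets}
    for sent in sentences:
        toks = sent.lower().split()
        for i, tok in enumerate(toks):
            if tok in counts:
                for ctx in toks[max(0, i - window): i + window + 1]:
                    if ctx in domain_words and ctx != tok:
                        counts[tok][ctx] += 1
    # Rows are targets, columns are the domain words: one domain matrix.
    return [[counts[t][w] for w in domain_words] for t in targets]

rows = domain_cooccurrence(
    ["the heron builds its nest by the river"], ["heron"], window=7)
print(rows)  # [[0, 1, 0, 1]] -- "heron" co-occurs with "river" and "nest"
```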
Grounded learning for compositional vector semantics
Categorical compositional distributional semantics is an approach to modelling language that combines the success of vector-based models of meaning with the compositional power of formal semantics. However, this approach was developed without an eye to cognitive plausibility. Vector representations of concepts and concept binding are also of interest in cognitive science, and have been proposed as a way of representing concepts within a biologically plausible spiking neural network. This work proposes a way for compositional distributional semantics to be implemented within a spiking neural network architecture, with the potential to address problems in concept binding, and gives a small implementation. We also describe a means of training word representations using labelled images.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (6 more...)
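Concept binding in this vector-symbolic literature is commonly implemented with circular convolution (as in holographic reduced representations); the toy sketch below uses that operation as an assumed stand-in for the paper's spiking implementation, which it does not reproduce.

```python
import numpy as np

def bind(a, b):
    # Circular convolution: a standard binding operation in vector symbolic
    # architectures, computed in O(d log d) via the FFT.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    # Approximate inverse: bind with the involution of a.
    a_inv = np.concatenate(([a[0]], a[:0:-1]))
    return bind(c, a_inv)

d = 512
rng = np.random.default_rng(1)
verb = rng.normal(0, 1 / np.sqrt(d), d)     # e.g., vector for "chase"
subj = rng.normal(0, 1 / np.sqrt(d), d)     # e.g., vector for "dog"

sentence = bind(verb, subj)                 # compose a role-filler pair
recovered = unbind(sentence, verb)          # query the composite for the subject
print(round(float(recovered @ subj), 2))    # close to 1.0: binding is recoverable
```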
Contextualized word senses: from attention to compositionality
The neural architectures of language models are becoming increasingly complex, especially that of Transformers, which are based on the attention mechanism. Although their application to numerous natural language processing tasks has proven very fruitful, they remain models with little or no interpretability and explainability. One of the tasks for which they are best suited is encoding the contextual sense of words using contextualized embeddings. In this paper we propose a transparent, interpretable, and linguistically motivated strategy for encoding the contextual sense of words by modeling semantic compositionality. Particular attention is given to dependency relations and to semantic notions such as selectional preferences and paradigmatic classes. A partial implementation of the proposed model is carried out and compared with Transformer-based architectures on a given semantic task, namely computing the similarity of word senses in context. The results show that linguistically motivated models can be competitive with the black boxes underlying complex neural architectures.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.04)
- (9 more...)
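A toy sketch of the dependency-driven contextualization described in the entry above, under our own assumptions: a head word's static vector is composed with the vector of its syntactic dependent, so that different dependents select different senses. The composition weight and the random example vectors are invented, not values from the paper.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def contextualize(head_vec, dep_vec, alpha=0.5):
    # Weighted additive composition along one dependency relation;
    # alpha is an assumed knob, not a value from the paper.
    return unit((1 - alpha) * head_vec + alpha * dep_vec)

rng = np.random.default_rng(2)
bank = unit(rng.normal(size=50))            # ambiguous head: "bank"
river = unit(rng.normal(size=50))           # dependent selecting one sense
loan = unit(rng.normal(size=50))            # dependent selecting another

bank_by_river = contextualize(bank, river)  # "bank" modified by "river"
bank_of_loan = contextualize(bank, loan)    # "bank" governing "loan"

# In-context sense similarity is then plain cosine between composed vectors.
print(round(float(bank_by_river @ bank_of_loan), 2))
```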
Investigating Antigram Behaviour using Distributional Semantics
The field of computational linguistics constantly presents new challenges and topics for research, whether analyzing changes in word usage over time or identifying relationships between pairs of seemingly unrelated words. To this point, we identify anagrams and antigrams as word pairs possessing such unique properties. The presented work is an exploration into generating anagrams from a given word and determining whether antigram relationships (semantically opposite anagrams) exist between the pairs of generated anagrams, using GloVe embeddings. We propose a rudimentary, yet interpretable, rule-based algorithm for detecting antigrams. On a small dataset of just 12 antigrams, our approach yielded an accuracy of 39%, which shows that there is much work left to be done in this space.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Arizona (0.04)
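A minimal sketch of the pipeline described in the entry above, under our own assumptions: enumerate anagrams of a word that appear in a vocabulary, then apply a simple rule over GloVe vectors to flag antigram candidates. The thresholded-cosine rule is an invented placeholder; the paper's actual rule-based algorithm may differ.

```python
from itertools import permutations

def anagrams(word, vocab):
    """All vocabulary words that are anagrams of `word` (excluding itself)."""
    perms = {"".join(p) for p in permutations(word)}
    return (perms & vocab) - {word}

def cosine(v1, v2):
    num = sum(a * b for a, b in zip(v1, v2))
    den = (sum(a * a for a in v1) ** 0.5) * (sum(b * b for b in v2) ** 0.5)
    return num / den

def is_antigram(w1, w2, glove, threshold=0.0):
    # Placeholder rule: flag pairs whose GloVe cosine falls below an assumed
    # threshold. Note that distributional vectors often place antonyms close
    # together, which is one reason this detection task is hard.
    return cosine(glove[w1], glove[w2]) < threshold

vocab = {"united", "untied", "listen", "silent"}
print(anagrams("united", vocab))   # {'untied'} -- a classic antigram candidate
```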